An Empirical Study of a New Approach to Nearest Neighbor Searching

نویسندگان

  • Songrit Maneewongvatana
  • David M. Mount
چکیده

In nearest neighbor searching we are given a set of n data points in real d-dimensional space, <d, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. In this paper we consider a novel approach to nearest neighbor searching, in which the search returns the correct nearest neighbor with a given probability assuming that the queries are drawn from some known distribution. The query distribution is represented by providing a set of training query points at preprocessing time. The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which is used to determine whether the search should visit this node. We use principal component analysis and support vector machines to analyze the structure of the data and training points in order to better adapt the tree structure to the data sets. We show empirically that this new approach provides improved predictability over the kd-tree in average

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-zero probability of nearest neighbor searching

Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

EFFECT OF THE NEXT-NEAREST NEIGHBOR INTERACTION ON THE ORDER-DISORDER PHASE TRANSITION

In this work, one and two-dimensional lattices are studied theoretically by a statistical mechanical approach. The nearest and next-nearest neighbor interactions are both taken into account, and the approximate thermodynamic properties of the lattices are calculated. The results of our calculations show that: (1) even though the next-nearest neighbor interaction may have an insignificant ef...

متن کامل

Probably correct k-nearest neighbor search in high dimensions

A novel approach for k-nearest neighbor (k-NN) searching with Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k-nearest neighbors is obtained in high probability, is proposed for greatly reducing the searching time. We ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001